The first two chunks of this R Markdown file after the R setup chunk enable plot zooming, which means the HTML file must be opened in a browser to view the document properly. When the file knits in RStudio the preview will appear empty, but the HTML file opened in a browser will contain all of the content, and you can click on any plot to zoom in on it.
A few notes about this script.
If you are running this with the 2022-2023 data, make sure you download the whole [OSM_2022-2023 GitHub repository](https://github.com/ACMElabUvic/OSM_2022-2023) from the ACMElabUvic GitHub. This will ensure you have all the files, data, and proper folder structure you will need to run this code and associated analyses.
Also make sure you open RStudio through the R project (OSM_2022-2023.Rproj). This will automatically set your working directory to the correct place (wherever you saved the repository), so you don’t have to change the file paths for some of the data.
Lastly, if you are looking to adapt this code for a future year of data, you will want to ensure you have run the 1_ACME_camera_script_9-2-2024.R or .Rmd with your data as there is much data formatting, cleaning, and restructuring that has to be done before this code will work. Helpful note: The files are numbered in the order they are used for this analysis.
If you have questions, please email the most recent author, currently:
Marissa A. Dyck
Postdoctoral research fellow
University of Victoria
School of Environmental Studies
Email: marissadyck17@gmail.com
(update/add authors as needed)
If you don’t already have the following packages installed, use the code below to install them.
install.packages('tidyverse')
install.packages('PerformanceAnalytics')
install.packages('Hmisc')
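If some of these packages are already installed, reinstalling all three is unnecessary. The sketch below is one way to list only the missing ones; `find_missing_packages()` is a hypothetical helper written for this example, not part of the original script, and the install call is left commented out so nothing is downloaded unless you choose to.

```r
# packages this script needs (taken from the install.packages() calls above)
required_packages <- c("tidyverse", "PerformanceAnalytics", "Hmisc")

# hypothetical helper: return the subset of pkgs that is not installed yet
find_missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

missing_packages <- find_missing_packages(required_packages)
print(missing_packages)

# uncomment to install only what is missing
# if (length(missing_packages) > 0) install.packages(missing_packages)
```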
Then load the packages to your library.
library(tidyverse) # data tidying, visualization, and much more; this will load all tidyverse packages, can see complete list using tidyverse_packages()
library(PerformanceAnalytics) #Used to generate a correlation plot
library(Hmisc) # used to generate histograms for all variables in data frame
To do any analysis with the detection data from the OSM arrays, we will want to pair it with the covariate data which has human factors indices (HFI) and landcover data (VEG) for each site. There are a lot of covariates/features in these datasets that need to be grouped together to be usable, which is what this script covers.
Let’s read in the covariate data for all 6 LUs (outputs from the 2021-2022 and 2022-2023 1_ACME_camera_script_9-2-2024.Rmd). We’ve copied the 2021-2022 data from the OSM_2021-2022 repository and saved it to the processed folder so we can read in both data files with the same file path.
# model covariates (merged HFI and VEG data from the ACME_camera_script_9-2-2024.R or .Rmd)
covariates <- file.path('data/processed',
                        c('OSM_covariates_2022.csv',
                          'OSM_covariates_2021.csv')) %>%
  map(~.x %>%
        read_csv(.,
                 # set the column types to read in correctly
                 col_types = cols(array = col_factor(),
                                  camera = col_factor(),
                                  site = col_factor(),
                                  buff_dist = col_factor(),
                                  .default = col_number()))) %>%
  # give names to each data frame in list
  purrr::set_names('covs_2022',
                   'covs_2021') # R doesn't like when they are just numbers, you can make it work but it's annoying to call the data frame later so I've called them covs_year
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
# check variable structure
str(covariates)
## List of 2
## $ covs_2022: spc_tbl_ [3,100 × 119] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ array : Factor w/ 4 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..$ camera : Factor w/ 96 levels "18","15","03",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ site : Factor w/ 155 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..$ vegetated_edge_roads : num [1:3100] 0 0.0858 0 0 0 ...
## ..$ harvest_area : num [1:3100] 0 0 0.687 0.337 0 ...
## ..$ road_gravel_1l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ conventional_seismic : num [1:3100] 0 0.03277 0 0.00889 0.01144 ...
## ..$ tame_pasture : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ pipeline : num [1:3100] 0 0.068 0 0 0.0301 ...
## ..$ road_gravel_2l : num [1:3100] 0 0 0 0 0 ...
## ..$ trail : num [1:3100] 0.00588 0.0028 0 0.00196 0 ...
## ..$ well_bitumen : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rough_pasture : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_aband : num [1:3100] 0 0 0 0 0.0322 ...
## ..$ road_unclassified : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ crop : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ low_impact_seismic : num [1:3100] 0 0 0 0 0.0523 ...
## ..$ clearing_unknown : num [1:3100] 0.0923 0.0697 0 0 0 ...
## ..$ cultivation_abandoned : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_undiv_2l : num [1:3100] 0 0.0174 0 0 0 ...
## ..$ road_unimproved : num [1:3100] 0 0 0 0 0 ...
## ..$ truck_trail : num [1:3100] 0 0 0 0.0139 0 ...
## ..$ dugout : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_undiv_1l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_gas : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vegetated_edge_railways : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ harvest_area_white_zone : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ country_residence : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpit_dry : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rural_residence : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpit_wet : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpits : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ grvl_sand_pit : num [1:3100] 0 0.0873 0 0 0 ...
## ..$ ris_reclaimed_temp : num [1:3100] 0 0.0477 0 0 0 ...
## ..$ ris_clearing_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_drainage : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_mines_oilsands : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_overburden_dump : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_facility_operations : num [1:3100] 0 0 0 0 0 ...
## ..$ transmission_line : num [1:3100] 0.0642 0 0 0 0.091 ...
## ..$ ris_tailing_pond : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ clearing_wellpad_unconfirmed: num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mines_oilsands : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_soil_replaced : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_1l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_oilsands_rms : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_facility_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_borrowpits : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_transmission_line : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_soil_salvaged : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_road : num [1:3100] 0 0 0 0 0 ...
## ..$ ris_plant : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ urban_residence : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility_other : num [1:3100] 0 0 0 0 0 ...
## ..$ airp_runway : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ runway : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_reclaimed_permanent : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ urban_industrial : num [1:3100] 0.291 0 0 0 0 ...
## ..$ lagoon : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ residence_clearing : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cased : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_unpaved_2l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_3l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ surrounding_veg : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy_sgl_track : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_winter : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ sump : num [1:3100] 0 0 0 0 0 ...
## ..$ greenspace : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_2l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_other : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ canal : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ reservoir : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cleared_not_confirmed : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ misc_oil_gas_facility : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ camp_industrial : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_camp_industrial : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ oil_gas_plant : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_unknown : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_utilities : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cfo : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ recreation : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ campground : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ peat : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ golfcourse : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ landfill : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ transfer_station : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mill : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_div : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy_spur : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cleared_not_drilled : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ open_pit_mine : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_oil : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_4l : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mines_pitlake : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_reclaimed_certified : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ris_windrow : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ tailing_pond : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
## .. [list output truncated]
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. .default = col_number(),
## .. .. array = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. camera = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. buff_dist = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. vegetated_edge_roads = col_number(),
## .. .. harvest_area = col_number(),
## .. .. road_gravel_1l = col_number(),
## .. .. conventional_seismic = col_number(),
## .. .. tame_pasture = col_number(),
## .. .. pipeline = col_number(),
## .. .. road_gravel_2l = col_number(),
## .. .. trail = col_number(),
## .. .. well_bitumen = col_number(),
## .. .. rough_pasture = col_number(),
## .. .. well_aband = col_number(),
## .. .. road_unclassified = col_number(),
## .. .. crop = col_number(),
## .. .. low_impact_seismic = col_number(),
## .. .. clearing_unknown = col_number(),
## .. .. cultivation_abandoned = col_number(),
## .. .. road_paved_undiv_2l = col_number(),
## .. .. road_unimproved = col_number(),
## .. .. truck_trail = col_number(),
## .. .. dugout = col_number(),
## .. .. road_paved_undiv_1l = col_number(),
## .. .. well_gas = col_number(),
## .. .. vegetated_edge_railways = col_number(),
## .. .. harvest_area_white_zone = col_number(),
## .. .. country_residence = col_number(),
## .. .. borrowpit_dry = col_number(),
## .. .. rural_residence = col_number(),
## .. .. borrowpit_wet = col_number(),
## .. .. borrowpits = col_number(),
## .. .. grvl_sand_pit = col_number(),
## .. .. ris_reclaimed_temp = col_number(),
## .. .. ris_clearing_unknown = col_number(),
## .. .. ris_drainage = col_number(),
## .. .. ris_mines_oilsands = col_number(),
## .. .. ris_overburden_dump = col_number(),
## .. .. ris_facility_operations = col_number(),
## .. .. transmission_line = col_number(),
## .. .. ris_tailing_pond = col_number(),
## .. .. clearing_wellpad_unconfirmed = col_number(),
## .. .. mines_oilsands = col_number(),
## .. .. ris_soil_replaced = col_number(),
## .. .. road_paved_1l = col_number(),
## .. .. ris_oilsands_rms = col_number(),
## .. .. ris_facility_unknown = col_number(),
## .. .. ris_borrowpits = col_number(),
## .. .. ris_transmission_line = col_number(),
## .. .. ris_soil_salvaged = col_number(),
## .. .. ris_road = col_number(),
## .. .. ris_plant = col_number(),
## .. .. urban_residence = col_number(),
## .. .. facility_other = col_number(),
## .. .. airp_runway = col_number(),
## .. .. runway = col_number(),
## .. .. ris_reclaimed_permanent = col_number(),
## .. .. urban_industrial = col_number(),
## .. .. lagoon = col_number(),
## .. .. facility_unknown = col_number(),
## .. .. residence_clearing = col_number(),
## .. .. well_cased = col_number(),
## .. .. road_unpaved_2l = col_number(),
## .. .. road_paved_3l = col_number(),
## .. .. surrounding_veg = col_number(),
## .. .. rlwy_sgl_track = col_number(),
## .. .. road_winter = col_number(),
## .. .. sump = col_number(),
## .. .. greenspace = col_number(),
## .. .. road_paved_2l = col_number(),
## .. .. well_other = col_number(),
## .. .. canal = col_number(),
## .. .. reservoir = col_number(),
## .. .. well_cleared_not_confirmed = col_number(),
## .. .. misc_oil_gas_facility = col_number(),
## .. .. camp_industrial = col_number(),
## .. .. ris_camp_industrial = col_number(),
## .. .. oil_gas_plant = col_number(),
## .. .. well_unknown = col_number(),
## .. .. ris_utilities = col_number(),
## .. .. cfo = col_number(),
## .. .. recreation = col_number(),
## .. .. campground = col_number(),
## .. .. peat = col_number(),
## .. .. golfcourse = col_number(),
## .. .. landfill = col_number(),
## .. .. transfer_station = col_number(),
## .. .. mill = col_number(),
## .. .. road_paved_div = col_number(),
## .. .. rlwy_spur = col_number(),
## .. .. well_cleared_not_drilled = col_number(),
## .. .. open_pit_mine = col_number(),
## .. .. well_oil = col_number(),
## .. .. road_paved_4l = col_number(),
## .. .. mines_pitlake = col_number(),
## .. .. ris_reclaimed_certified = col_number(),
## .. .. ris_windrow = col_number(),
## .. .. tailing_pond = col_number(),
## .. .. rlwy_mlt_track = col_number(),
## .. .. rlwy_dbl_track = col_number(),
## .. .. ris_waste = col_number(),
## .. .. interchange_ramp = col_number(),
## .. .. road_paved_5l = col_number(),
## .. .. ris_airp_runway = col_number(),
## .. .. fruit_vegetables = col_number(),
## .. .. road_unpaved_1l = col_number(),
## .. .. ris_reclaim_ready = col_number(),
## .. .. ris_tank_farm = col_number(),
## .. .. lc_class20 = col_number(),
## .. .. lc_class32 = col_number(),
## .. .. lc_class33 = col_number(),
## .. .. lc_class34 = col_number(),
## .. .. lc_class50 = col_number(),
## .. .. lc_class110 = col_number(),
## .. .. lc_class120 = col_number(),
## .. .. lc_class210 = col_number(),
## .. .. lc_class220 = col_number(),
## .. .. lc_class230 = col_number()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ covs_2021: spc_tbl_ [1,560 × 80] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ array : Factor w/ 2 levels "LU2","LU3": 1 1 1 1 1 1 1 1 1 1 ...
## ..$ camera : Factor w/ 58 levels "03","05","100",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ site : Factor w/ 78 levels "LU2_03","LU2_05",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ buff_dist : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..$ pipeline : num [1:1560] 0 0 0.0483 0 0.0218 ...
## ..$ harvest_area : num [1:1560] 0 0 0.0267 0 0 ...
## ..$ misc_oil_gas_facility : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ transmission_line : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ conventional_seismic : num [1:1560] 0.04091 0.00833 0.00259 0 0.00439 ...
## ..$ low_impact_seismic : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_aband : num [1:1560] 0.0203 0 0 0 0 ...
## ..$ well_gas : num [1:1560] 0 0 0 0 0.0391 ...
## ..$ well_other : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_bitumen : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ clearing_unknown : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ open_pit_mine : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vegetated_edge_roads : num [1:1560] 0.000958 0.022859 0.072033 0.021681 0.029158 ...
## ..$ road_paved_undiv_2l : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_gravel_1l : num [1:1560] 0 0.0227 0.0215 0.0216 0.0125 ...
## ..$ road_unimproved : num [1:1560] 0 0 0 0 0.00742 ...
## ..$ harvest_area_white_zone : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ trail : num [1:1560] 0 0.012877 0 0.000893 0 ...
## ..$ crop : num [1:1560] 0 0 0 0 0.000715 ...
## ..$ rough_pasture : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ tame_pasture : num [1:1560] 0 0 0 0 0.0153 ...
## ..$ rural_residence : num [1:1560] 0 0 0 0 0.00346 ...
## ..$ urban_residence : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ greenspace : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ recreation : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ runway : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cased : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility_unknown : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ urban_industrial : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ clearing_wellpad_unconfirmed: num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ grvl_sand_pit : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vegetated_edge_railways : num [1:1560] 0 0 0 0 0.127 ...
## ..$ road_unclassified : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ borrowpit_wet : num [1:1560] 0 0 0 0 0 ...
## ..$ borrowpit_dry : num [1:1560] 0 0 0 0 0 ...
## ..$ borrowpits : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ residence_clearing : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ campground : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_cleared_not_confirmed : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ camp_industrial : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ oil_gas_plant : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ truck_trail : num [1:1560] 0.000815 0 0 0 0 ...
## ..$ road_gravel_2l : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_undiv_1l : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ sump : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ dugout : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ country_residence : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mill : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_2l : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ facility_other : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ surrounding_veg : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy_sgl_track : num [1:1560] 0 0 0 0 0.0244 ...
## ..$ well_cleared_not_drilled : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ well_unknown : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cultivation_abandoned : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ golfcourse : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ airp_runway : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ lagoon : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ reservoir : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ transfer_station : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ landfill : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ mines_pitlake : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ rlwy_spur : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ road_paved_1l : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ canal : num [1:1560] 0 0 0 0 0.0196 ...
## ..$ gridcll : num [1:1560] 2 2 2 2 2 2 2 2 2 2 ...
## ..$ lab : num [1:1560] NA NA NA NA NA NA NA NA NA NA ...
## ..$ lc_class20 : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ lc_class33 : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ lc_class34 : num [1:1560] 0 0.19 0.212 0.163 0.366 ...
## ..$ lc_class50 : num [1:1560] 0 0.171 0 0 0 ...
## ..$ lc_class110 : num [1:1560] 0 0 0.214 0 0.406 ...
## ..$ lc_class120 : num [1:1560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ lc_class210 : num [1:1560] 0.1584 0.0821 0.0307 0.2237 0.1831 ...
## ..$ lc_class220 : num [1:1560] 0.838 0.134 0 0.613 0 ...
## ..$ lc_class230 : num [1:1560] 0.00411 0.42255 0.54321 0 0.0455 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. .default = col_number(),
## .. .. array = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. camera = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. buff_dist = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. pipeline = col_number(),
## .. .. harvest_area = col_number(),
## .. .. misc_oil_gas_facility = col_number(),
## .. .. transmission_line = col_number(),
## .. .. conventional_seismic = col_number(),
## .. .. low_impact_seismic = col_number(),
## .. .. well_aband = col_number(),
## .. .. well_gas = col_number(),
## .. .. well_other = col_number(),
## .. .. well_bitumen = col_number(),
## .. .. clearing_unknown = col_number(),
## .. .. open_pit_mine = col_number(),
## .. .. vegetated_edge_roads = col_number(),
## .. .. road_paved_undiv_2l = col_number(),
## .. .. road_gravel_1l = col_number(),
## .. .. road_unimproved = col_number(),
## .. .. harvest_area_white_zone = col_number(),
## .. .. trail = col_number(),
## .. .. crop = col_number(),
## .. .. rough_pasture = col_number(),
## .. .. tame_pasture = col_number(),
## .. .. rural_residence = col_number(),
## .. .. urban_residence = col_number(),
## .. .. greenspace = col_number(),
## .. .. recreation = col_number(),
## .. .. runway = col_number(),
## .. .. well_cased = col_number(),
## .. .. facility_unknown = col_number(),
## .. .. urban_industrial = col_number(),
## .. .. clearing_wellpad_unconfirmed = col_number(),
## .. .. grvl_sand_pit = col_number(),
## .. .. vegetated_edge_railways = col_number(),
## .. .. road_unclassified = col_number(),
## .. .. borrowpit_wet = col_number(),
## .. .. borrowpit_dry = col_number(),
## .. .. borrowpits = col_number(),
## .. .. residence_clearing = col_number(),
## .. .. campground = col_number(),
## .. .. well_cleared_not_confirmed = col_number(),
## .. .. camp_industrial = col_number(),
## .. .. oil_gas_plant = col_number(),
## .. .. truck_trail = col_number(),
## .. .. road_gravel_2l = col_number(),
## .. .. road_paved_undiv_1l = col_number(),
## .. .. sump = col_number(),
## .. .. dugout = col_number(),
## .. .. country_residence = col_number(),
## .. .. mill = col_number(),
## .. .. road_paved_2l = col_number(),
## .. .. facility_other = col_number(),
## .. .. surrounding_veg = col_number(),
## .. .. rlwy_sgl_track = col_number(),
## .. .. well_cleared_not_drilled = col_number(),
## .. .. well_unknown = col_number(),
## .. .. cultivation_abandoned = col_number(),
## .. .. golfcourse = col_number(),
## .. .. airp_runway = col_number(),
## .. .. lagoon = col_number(),
## .. .. reservoir = col_number(),
## .. .. transfer_station = col_number(),
## .. .. landfill = col_number(),
## .. .. mines_pitlake = col_number(),
## .. .. rlwy_spur = col_number(),
## .. .. road_paved_1l = col_number(),
## .. .. canal = col_number(),
## .. .. gridcll = col_number(),
## .. .. lab = col_number(),
## .. .. lc_class20 = col_number(),
## .. .. lc_class33 = col_number(),
## .. .. lc_class34 = col_number(),
## .. .. lc_class50 = col_number(),
## .. .. lc_class110 = col_number(),
## .. .. lc_class120 = col_number(),
## .. .. lc_class210 = col_number(),
## .. .. lc_class220 = col_number(),
## .. .. lc_class230 = col_number()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
You may get a warning about parsing issues when the files are read in. Don’t panic, this is fine.
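If you would like to see what triggered the warning, readr's `problems()` function lists the cells that could not be parsed. The sketch below shows this on a toy file, assuming the warning comes from an entry that cannot be read as a number; the file and its values are made up for illustration and are not the real covariate data.

```r
library(readr)

# write a tiny CSV with one non-numeric entry in a numeric column (hypothetical data)
toy_file <- tempfile(fileext = ".csv")
writeLines(c("site,pipeline",
             "LU13_18,0.068",
             "LU13_15,not_a_number"),
           toy_file)

# read it the same way as the covariate files: site as factor, everything else numeric
toy_covs <- read_csv(toy_file,
                     col_types = cols(site = col_factor(),
                                      .default = col_number()))

# list the cells readr could not parse (row, column, expected vs. actual value)
parse_issues <- problems(toy_covs)
print(parse_issues)

# the offending cell is read in as NA
print(toy_covs$pipeline)
```

With the real data, the equivalent call would be `problems(covariates$covs_2022)` or `problems(covariates$covs_2021)`.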
We want a single covariate data frame, not two list elements with separate data frames as we have now, so we need to join the two data frames. We’ve done our best to ensure these are formatted similarly, but unfortunately they still don’t have exactly the same columns, so they won’t bind nicely with the base R function rbind().
This is likely to be the case each year, but we can use the dplyr function bind_rows(), which matches columns by name when stacking the rows and fills any column that is missing from one of the data frames with NAs.
covariates_merged <- dplyr::bind_rows(covariates$covs_2022,
                                      covariates$covs_2021)
head(covariates_merged)
## # A tibble: 6 × 121
## array camera site buff_dist vegetated_edge_roads harvest_area road_gravel_1l
## <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl>
## 1 LU13 18 LU13_… 250 0 0 0
## 2 LU13 15 LU13_… 250 0.0858 0 0
## 3 LU13 03 LU13_… 250 0 0.687 0
## 4 LU13 34 LU13_… 250 0 0.337 0
## 5 LU13 57 LU13_… 250 0 0 0
## 6 LU13 16 LU13_… 250 0 0 0
## # ℹ 114 more variables: conventional_seismic <dbl>, tame_pasture <dbl>,
## # pipeline <dbl>, road_gravel_2l <dbl>, trail <dbl>, well_bitumen <dbl>,
## # rough_pasture <dbl>, well_aband <dbl>, road_unclassified <dbl>, crop <dbl>,
## # low_impact_seismic <dbl>, clearing_unknown <dbl>,
## # cultivation_abandoned <dbl>, road_paved_undiv_2l <dbl>,
## # road_unimproved <dbl>, truck_trail <dbl>, dugout <dbl>,
## # road_paved_undiv_1l <dbl>, well_gas <dbl>, vegetated_edge_railways <dbl>, …
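The difference between rbind() and bind_rows() is easier to see on a small scale. The sketch below uses two toy data frames with made-up values (`toy_2022` and `toy_2021` are hypothetical stand-ins, not the real covariate files) whose columns only partly overlap, as the yearly covariate files do.

```r
library(dplyr)

# toy stand-ins for the two yearly files; values are made up
toy_2022 <- tibble(site     = c("LU13_18", "LU13_15"),
                   pipeline = c(0, 0.068),
                   ris_road = c(0, 0.01))

toy_2021 <- tibble(site     = c("LU2_03", "LU2_05"),
                   pipeline = c(0, 0.0483),
                   gridcll  = c(2, 2))

# columns that appear in one year but not the other
only_2022 <- setdiff(names(toy_2022), names(toy_2021))
only_2021 <- setdiff(names(toy_2021), names(toy_2022))

# base rbind() stops with an error because the column names differ
rbind_attempt <- try(rbind(toy_2022, toy_2021), silent = TRUE)
print(inherits(rbind_attempt, "try-error"))

# bind_rows() matches columns by name and fills the gaps with NA
toy_merged <- bind_rows(toy_2022, toy_2021)
print(toy_merged)
```

The same `setdiff()` calls on `names(covariates$covs_2022)` and `names(covariates$covs_2021)` would show which columns are filled with NAs for a given year in the merged data.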
Let’s check over this data to make sure the bind worked how we expected it to.
While we specified how the columns should read in when we imported the data, this could change during the merge or from year-to-year so let’s double check the data structure now that all 6 LUs are in one data frame.
We can also check that all the LUs and all the sites are indeed in the data. We should have 6 LUs and 233 sites (155 from 2022-2023 and 78 from 2021-2022).
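These counts can also be checked directly rather than read off the str() output. The sketch below shows the idea on a toy data frame (`toy_covs` and its values are hypothetical); it counts the factor levels and confirms that every site has one row per buffer distance.

```r
library(dplyr)

# toy stand-in: 3 sites in 2 arrays, each with 2 buffer distances (made-up values)
toy_covs <- tibble(
  array     = factor(c("LU13", "LU13", "LU13", "LU13", "LU2", "LU2")),
  site      = factor(c("LU13_18", "LU13_18", "LU13_15", "LU13_15", "LU2_03", "LU2_03")),
  buff_dist = factor(c("250", "500", "250", "500", "250", "500"))
)

# number of arrays (LUs), sites, and buffer distances in the data
n_arrays  <- nlevels(toy_covs$array)
n_sites   <- nlevels(toy_covs$site)
n_buffers <- nlevels(toy_covs$buff_dist)

# every site should appear exactly once per buffer distance
rows_per_site  <- count(toy_covs, site)
sites_complete <- all(rows_per_site$n == n_buffers)

# so the total row count should equal sites x buffer distances
rows_match <- nrow(toy_covs) == n_sites * n_buffers

print(c(arrays = n_arrays, sites = n_sites, buffers = n_buffers))
print(c(sites_complete = sites_complete, rows_match = rows_match))
```

Applied to `covariates_merged`, the same checks should give 6 arrays and 233 sites, and with 20 buffer distances per site the expected row count is 233 × 20 = 4,660.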
str(covariates_merged)
## spc_tbl_ [4,660 × 121] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ array : Factor w/ 6 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ camera : Factor w/ 111 levels "18","15","03",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ site : Factor w/ 233 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ buff_dist : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ vegetated_edge_roads : num [1:4660] 0 0.0858 0 0 0 ...
## $ harvest_area : num [1:4660] 0 0 0.687 0.337 0 ...
## $ road_gravel_1l : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ conventional_seismic : num [1:4660] 0 0.03277 0 0.00889 0.01144 ...
## $ tame_pasture : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ pipeline : num [1:4660] 0 0.068 0 0 0.0301 ...
## $ road_gravel_2l : num [1:4660] 0 0 0 0 0 ...
## $ trail : num [1:4660] 0.00588 0.0028 0 0.00196 0 ...
## $ well_bitumen : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ rough_pasture : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_aband : num [1:4660] 0 0 0 0 0.0322 ...
## $ road_unclassified : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ crop : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ low_impact_seismic : num [1:4660] 0 0 0 0 0.0523 ...
## $ clearing_unknown : num [1:4660] 0.0923 0.0697 0 0 0 ...
## $ cultivation_abandoned : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_undiv_2l : num [1:4660] 0 0.0174 0 0 0 ...
## $ road_unimproved : num [1:4660] 0 0 0 0 0 ...
## $ truck_trail : num [1:4660] 0 0 0 0.0139 0 ...
## $ dugout : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_undiv_1l : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_gas : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ vegetated_edge_railways : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ harvest_area_white_zone : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ country_residence : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ borrowpit_dry : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ rural_residence : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ borrowpit_wet : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ borrowpits : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ grvl_sand_pit : num [1:4660] 0 0.0873 0 0 0 ...
## $ ris_reclaimed_temp : num [1:4660] 0 0.0477 0 0 0 ...
## $ ris_clearing_unknown : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_drainage : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_mines_oilsands : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_overburden_dump : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_facility_operations : num [1:4660] 0 0 0 0 0 ...
## $ transmission_line : num [1:4660] 0.0642 0 0 0 0.091 ...
## $ ris_tailing_pond : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ clearing_wellpad_unconfirmed: num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ mines_oilsands : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_soil_replaced : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_1l : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_oilsands_rms : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_facility_unknown : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_borrowpits : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_transmission_line : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_soil_salvaged : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_road : num [1:4660] 0 0 0 0 0 ...
## $ ris_plant : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ urban_residence : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ facility_other : num [1:4660] 0 0 0 0 0 ...
## $ airp_runway : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ runway : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_reclaimed_permanent : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ urban_industrial : num [1:4660] 0.291 0 0 0 0 ...
## $ lagoon : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ facility_unknown : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ residence_clearing : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_cased : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_unpaved_2l : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_3l : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ surrounding_veg : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ rlwy_sgl_track : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_winter : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ sump : num [1:4660] 0 0 0 0 0 ...
## $ greenspace : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_2l : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_other : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ canal : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ reservoir : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_cleared_not_confirmed : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ misc_oil_gas_facility : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ camp_industrial : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_camp_industrial : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ oil_gas_plant : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_unknown : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_utilities : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ cfo : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ recreation : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ campground : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ peat : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ golfcourse : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ landfill : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ transfer_station : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ mill : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_div : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ rlwy_spur : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_cleared_not_drilled : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ open_pit_mine : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ well_oil : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ road_paved_4l : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ mines_pitlake : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_reclaimed_certified : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ ris_windrow : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ tailing_pond : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## [list output truncated]
## - attr(*, "spec")=
## .. cols(
## .. .default = col_number(),
## .. array = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. camera = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. buff_dist = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. vegetated_edge_roads = col_number(),
## .. harvest_area = col_number(),
## .. road_gravel_1l = col_number(),
## .. conventional_seismic = col_number(),
## .. tame_pasture = col_number(),
## .. pipeline = col_number(),
## .. road_gravel_2l = col_number(),
## .. trail = col_number(),
## .. well_bitumen = col_number(),
## .. rough_pasture = col_number(),
## .. well_aband = col_number(),
## .. road_unclassified = col_number(),
## .. crop = col_number(),
## .. low_impact_seismic = col_number(),
## .. clearing_unknown = col_number(),
## .. cultivation_abandoned = col_number(),
## .. road_paved_undiv_2l = col_number(),
## .. road_unimproved = col_number(),
## .. truck_trail = col_number(),
## .. dugout = col_number(),
## .. road_paved_undiv_1l = col_number(),
## .. well_gas = col_number(),
## .. vegetated_edge_railways = col_number(),
## .. harvest_area_white_zone = col_number(),
## .. country_residence = col_number(),
## .. borrowpit_dry = col_number(),
## .. rural_residence = col_number(),
## .. borrowpit_wet = col_number(),
## .. borrowpits = col_number(),
## .. grvl_sand_pit = col_number(),
## .. ris_reclaimed_temp = col_number(),
## .. ris_clearing_unknown = col_number(),
## .. ris_drainage = col_number(),
## .. ris_mines_oilsands = col_number(),
## .. ris_overburden_dump = col_number(),
## .. ris_facility_operations = col_number(),
## .. transmission_line = col_number(),
## .. ris_tailing_pond = col_number(),
## .. clearing_wellpad_unconfirmed = col_number(),
## .. mines_oilsands = col_number(),
## .. ris_soil_replaced = col_number(),
## .. road_paved_1l = col_number(),
## .. ris_oilsands_rms = col_number(),
## .. ris_facility_unknown = col_number(),
## .. ris_borrowpits = col_number(),
## .. ris_transmission_line = col_number(),
## .. ris_soil_salvaged = col_number(),
## .. ris_road = col_number(),
## .. ris_plant = col_number(),
## .. urban_residence = col_number(),
## .. facility_other = col_number(),
## .. airp_runway = col_number(),
## .. runway = col_number(),
## .. ris_reclaimed_permanent = col_number(),
## .. urban_industrial = col_number(),
## .. lagoon = col_number(),
## .. facility_unknown = col_number(),
## .. residence_clearing = col_number(),
## .. well_cased = col_number(),
## .. road_unpaved_2l = col_number(),
## .. road_paved_3l = col_number(),
## .. surrounding_veg = col_number(),
## .. rlwy_sgl_track = col_number(),
## .. road_winter = col_number(),
## .. sump = col_number(),
## .. greenspace = col_number(),
## .. road_paved_2l = col_number(),
## .. well_other = col_number(),
## .. canal = col_number(),
## .. reservoir = col_number(),
## .. well_cleared_not_confirmed = col_number(),
## .. misc_oil_gas_facility = col_number(),
## .. camp_industrial = col_number(),
## .. ris_camp_industrial = col_number(),
## .. oil_gas_plant = col_number(),
## .. well_unknown = col_number(),
## .. ris_utilities = col_number(),
## .. cfo = col_number(),
## .. recreation = col_number(),
## .. campground = col_number(),
## .. peat = col_number(),
## .. golfcourse = col_number(),
## .. landfill = col_number(),
## .. transfer_station = col_number(),
## .. mill = col_number(),
## .. road_paved_div = col_number(),
## .. rlwy_spur = col_number(),
## .. well_cleared_not_drilled = col_number(),
## .. open_pit_mine = col_number(),
## .. well_oil = col_number(),
## .. road_paved_4l = col_number(),
## .. mines_pitlake = col_number(),
## .. ris_reclaimed_certified = col_number(),
## .. ris_windrow = col_number(),
## .. tailing_pond = col_number(),
## .. rlwy_mlt_track = col_number(),
## .. rlwy_dbl_track = col_number(),
## .. ris_waste = col_number(),
## .. interchange_ramp = col_number(),
## .. road_paved_5l = col_number(),
## .. ris_airp_runway = col_number(),
## .. fruit_vegetables = col_number(),
## .. road_unpaved_1l = col_number(),
## .. ris_reclaim_ready = col_number(),
## .. ris_tank_farm = col_number(),
## .. lc_class20 = col_number(),
## .. lc_class32 = col_number(),
## .. lc_class33 = col_number(),
## .. lc_class34 = col_number(),
## .. lc_class50 = col_number(),
## .. lc_class110 = col_number(),
## .. lc_class120 = col_number(),
## .. lc_class210 = col_number(),
## .. lc_class220 = col_number(),
## .. lc_class230 = col_number()
## .. )
## - attr(*, "problems")=<externalptr>
It looks like everything read in correctly. I don’t see any missing columns (we won’t need the lab or gridcll columns, which we can deselect later), and all the arrays (LUs) and sites are accounted for.
Let’s check the data summary now. We might have NAs for some of the HFI features, but we shouldn’t have any for the other variables.
summary(covariates_merged)
## array camera site buff_dist vegetated_edge_roads
## LU13:820 27 : 120 LU13_18: 20 250 : 233 Min. :0.000000
## LU15:780 32 : 120 LU13_15: 20 500 : 233 1st Qu.:0.002604
## LU21:720 36 : 120 LU13_03: 20 750 : 233 Median :0.006764
## LU01:780 21 : 100 LU13_34: 20 1000 : 233 Mean :0.010682
## LU2 :840 41 : 100 LU13_57: 20 1250 : 233 3rd Qu.:0.013869
## LU3 :720 18 : 80 LU13_16: 20 1500 : 233 Max. :0.147883
## (Other):4020 (Other):4540 (Other):3262
## harvest_area road_gravel_1l conventional_seismic tame_pasture
## Min. :0.00000 Min. :0.000000 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.003485 1st Qu.:0.0000000
## Median :0.00000 Median :0.001385 Median :0.006323 Median :0.0000000
## Mean :0.04720 Mean :0.002913 Mean :0.006592 Mean :0.0008195
## 3rd Qu.:0.03969 3rd Qu.:0.003689 3rd Qu.:0.009171 3rd Qu.:0.0000000
## Max. :0.83674 Max. :0.038085 Max. :0.045512 Max. :0.1636895
##
## pipeline road_gravel_2l trail well_bitumen
## Min. :0.00000 Min. :0.0000000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0.0001209 1st Qu.:0.000000
## Median :0.01158 Median :0.0000000 Median :0.0007039 Median :0.000000
## Mean :0.01810 Mean :0.0011075 Mean :0.0010490 Mean :0.006039
## 3rd Qu.:0.02619 3rd Qu.:0.0004745 3rd Qu.:0.0015517 3rd Qu.:0.005144
## Max. :0.28896 Max. :0.0438815 Max. :0.0197691 Max. :0.187398
##
## rough_pasture well_aband road_unclassified
## Min. :0.0000000 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0.0003367 1st Qu.:0.000e+00
## Median :0.0000000 Median :0.0019160 Median :0.000e+00
## Mean :0.0002038 Mean :0.0058542 Mean :4.093e-06
## 3rd Qu.:0.0000000 3rd Qu.:0.0093228 3rd Qu.:0.000e+00
## Max. :0.0828324 Max. :0.3045402 Max. :8.613e-04
##
## crop low_impact_seismic clearing_unknown
## Min. :0.000e+00 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.0000000
## Median :0.000e+00 Median :0.000000 Median :0.0001542
## Mean :1.469e-06 Mean :0.005522 Mean :0.0044589
## 3rd Qu.:0.000e+00 3rd Qu.:0.004557 3rd Qu.:0.0026457
## Max. :2.571e-03 Max. :0.087576 Max. :0.4023522
##
## cultivation_abandoned road_paved_undiv_2l road_unimproved
## Min. :0.000e+00 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0.000e+00 Median :0.0000000 Median :0.0003318
## Mean :2.547e-05 Mean :0.0005082 Mean :0.0016662
## 3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:0.0018760
## Max. :3.115e-02 Max. :0.0431664 Max. :0.0532898
##
## truck_trail dugout road_paved_undiv_1l well_gas
## Min. :0.000000 Min. :0.000e+00 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.000000 Median :0.000e+00 Median :0.000e+00 Median :0.0000000
## Mean :0.000609 Mean :3.480e-06 Mean :7.514e-05 Mean :0.0003188
## 3rd Qu.:0.000398 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.0001151
## Max. :0.038651 Max. :1.825e-03 Max. :2.147e-02 Max. :0.0572117
##
## vegetated_edge_railways harvest_area_white_zone country_residence
## Min. :0.000e+00 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0.000e+00 Median :0.0000000 Median :0.0000000
## Mean :8.976e-05 Mean :0.0002387 Mean :0.0000608
## 3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :1.271e-01 Max. :0.0543438 Max. :0.0171405
##
## borrowpit_dry rural_residence borrowpit_wet borrowpits
## Min. :0.0000000 Min. :0.000e+00 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.0000000
## Median :0.0000000 Median :0.000e+00 Median :0.000000 Median :0.0000000
## Mean :0.0009134 Mean :5.307e-05 Mean :0.000642 Mean :0.0003201
## 3rd Qu.:0.0003956 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.0000000
## Max. :0.1038665 Max. :2.805e-02 Max. :0.271759 Max. :0.1163709
##
## grvl_sand_pit ris_reclaimed_temp ris_clearing_unknown ris_drainage
## Min. :0.000000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.001888 Mean :0.0002 Mean :0.0004 Mean :0.0001
## 3rd Qu.:0.000000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :0.557858 Max. :0.0477 Max. :0.0494 Max. :0.0168
## NA's :1560 NA's :1560 NA's :1560
## ris_mines_oilsands ris_overburden_dump ris_facility_operations
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.0001 Mean :0.0001 Mean :0.0004
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :0.0567 Max. :0.0211 Max. :0.1274
## NA's :1560 NA's :1560 NA's :1560
## transmission_line ris_tailing_pond clearing_wellpad_unconfirmed
## Min. :0.000000 Min. :0.0000 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0.0000 1st Qu.:0.0000000
## Median :0.000000 Median :0.0000 Median :0.0000000
## Mean :0.004601 Mean :0.0012 Mean :0.0003592
## 3rd Qu.:0.004977 3rd Qu.:0.0000 3rd Qu.:0.0003713
## Max. :0.173950 Max. :0.1738 Max. :0.0723607
## NA's :1560
## mines_oilsands ris_soil_replaced road_paved_1l ris_oilsands_rms
## Min. :0.0000 Min. :0.0000 Min. :0 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0 Median :0.0000
## Mean :0.0009 Mean :0.0002 Mean :0 Mean :0.0002
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0 3rd Qu.:0.0000
## Max. :0.1223 Max. :0.0245 Max. :0 Max. :0.0335
## NA's :1560 NA's :1560 NA's :1560
## ris_facility_unknown ris_borrowpits ris_transmission_line ris_soil_salvaged
## Min. :0 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0 Mean :0.0000 Mean :0.0000 Mean :0.0001
## 3rd Qu.:0 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :0 Max. :0.0051 Max. :0.0027 Max. :0.0415
## NA's :1560 NA's :1560 NA's :1560 NA's :1560
## ris_road ris_plant urban_residence facility_other
## Min. :0.0000 Min. :0 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.0000 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.0000 Median :0 Median :0.000e+00 Median :0.0000000
## Mean :0.0002 Mean :0 Mean :4.099e-05 Mean :0.0007405
## 3rd Qu.:0.0000 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0.0218 Max. :0 Max. :1.157e-02 Max. :0.2009920
## NA's :1560 NA's :1560
## airp_runway runway ris_reclaimed_permanent urban_industrial
## Min. :0 Min. :0.000e+00 Min. :0.0000 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.0000 1st Qu.:0.000000
## Median :0 Median :0.000e+00 Median :0.0000 Median :0.000000
## Mean :0 Mean :3.529e-05 Mean :0.0006 Mean :0.001092
## 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :0 Max. :1.525e-02 Max. :0.0535 Max. :0.335749
## NA's :1560
## lagoon facility_unknown residence_clearing
## Min. :0.0000000 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.000e+00
## Median :0.0000000 Median :0.0000000 Median :0.000e+00
## Mean :0.0001343 Mean :0.0001777 Mean :7.892e-06
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.000e+00
## Max. :0.0218390 Max. :0.1379450 Max. :3.113e-03
##
## well_cased road_unpaved_2l road_paved_3l surrounding_veg
## Min. :0.0000000 Min. :0 Min. :0 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.0000000 Median :0 Median :0 Median :0.000e+00
## Mean :0.0005716 Mean :0 Mean :0 Mean :9.553e-05
## 3rd Qu.:0.0001940 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.000e+00
## Max. :0.0685807 Max. :0 Max. :0 Max. :8.209e-02
## NA's :1560 NA's :1560
## rlwy_sgl_track road_winter sump greenspace
## Min. :0.00e+00 Min. :0 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.00e+00 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.00e+00 Median :0 Median :0.000000 Median :0.000e+00
## Mean :2.92e-05 Mean :0 Mean :0.002142 Mean :1.466e-05
## 3rd Qu.:0.00e+00 3rd Qu.:0 3rd Qu.:0.001785 3rd Qu.:0.000e+00
## Max. :2.44e-02 Max. :0 Max. :0.311103 Max. :3.028e-03
## NA's :1560
## road_paved_2l well_other canal reservoir
## Min. :0 Min. :0.000000 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.000e+00
## Median :0 Median :0.000000 Median :0.0000000 Median :0.000e+00
## Mean :0 Mean :0.001548 Mean :0.0000167 Mean :8.339e-06
## 3rd Qu.:0 3rd Qu.:0.001006 3rd Qu.:0.0000000 3rd Qu.:0.000e+00
## Max. :0 Max. :0.116479 Max. :0.0196060 Max. :7.894e-03
##
## well_cleared_not_confirmed misc_oil_gas_facility camp_industrial
## Min. :0.0000000 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0.0000000 Median :0.0000000 Median :0.0000000
## Mean :0.0002716 Mean :0.0031465 Mean :0.0005956
## 3rd Qu.:0.0000000 3rd Qu.:0.0006169 3rd Qu.:0.0000000
## Max. :0.0829690 Max. :0.3449713 Max. :0.2450556
##
## ris_camp_industrial oil_gas_plant well_unknown ris_utilities
## Min. :0 Min. :0.000000 Min. :0.000e+00 Min. :0.0000
## 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.0000
## Median :0 Median :0.000000 Median :0.000e+00 Median :0.0000
## Mean :0 Mean :0.001106 Mean :3.274e-05 Mean :0.0000
## 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0.000e+00 3rd Qu.:0.0000
## Max. :0 Max. :0.175037 Max. :4.813e-03 Max. :0.0025
## NA's :1560 NA's :1560
## cfo recreation campground peat golfcourse
## Min. :0.0000 Min. :0 Min. :0.000000 Min. :0 Min. :0
## 1st Qu.:0.0000 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0 1st Qu.:0
## Median :0.0000 Median :0 Median :0.000000 Median :0 Median :0
## Mean :0.0000 Mean :0 Mean :0.000103 Mean :0 Mean :0
## 3rd Qu.:0.0000 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0 3rd Qu.:0
## Max. :0.0012 Max. :0 Max. :0.028966 Max. :0 Max. :0
## NA's :1560 NA's :1560
## landfill transfer_station mill road_paved_div rlwy_spur
## Min. :0 Min. :0 Min. :0 Min. :0.0000 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0.0000 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0.0000 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0.0000 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.0000 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0.0019 Max. :0
## NA's :1560
## well_cleared_not_drilled open_pit_mine well_oil road_paved_4l
## Min. :0.000e+00 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0.000e+00 Median :0.0000000 Median :0 Median :0
## Mean :2.143e-05 Mean :0.0005218 Mean :0 Mean :0
## 3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0
## Max. :1.469e-02 Max. :0.0641059 Max. :0 Max. :0
## NA's :1560 NA's :1560
## mines_pitlake ris_reclaimed_certified ris_windrow tailing_pond
## Min. :0 Min. :0 Min. :0.000 Min. :0.000
## 1st Qu.:0 1st Qu.:0 1st Qu.:0.000 1st Qu.:0.000
## Median :0 Median :0 Median :0.000 Median :0.000
## Mean :0 Mean :0 Mean :0.000 Mean :0.000
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :0 Max. :0 Max. :0.016 Max. :0.004
## NA's :1560 NA's :1560 NA's :1560
## rlwy_mlt_track rlwy_dbl_track ris_waste interchange_ramp road_paved_5l
## Min. :0 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0 Max. :0
## NA's :1560 NA's :1560 NA's :1560 NA's :1560 NA's :1560
## ris_airp_runway fruit_vegetables road_unpaved_1l ris_reclaim_ready
## Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0
## NA's :1560 NA's :1560 NA's :1560 NA's :1560
## ris_tank_farm lc_class20 lc_class32 lc_class33
## Min. :0 Min. :0.000000 Min. :0.0000 Min. :0.000000
## 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.0000 1st Qu.:0.000000
## Median :0 Median :0.002998 Median :0.0000 Median :0.000000
## Mean :0 Mean :0.028712 Mean :0.0000 Mean :0.003658
## 3rd Qu.:0 3rd Qu.:0.032240 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :0 Max. :0.519648 Max. :0.0118 Max. :0.324028
## NA's :1560 NA's :1560
## lc_class34 lc_class50 lc_class110 lc_class120
## Min. :0.00000 Min. :0.00000 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.00000 1st Qu.:0.01039 1st Qu.:0.009894 1st Qu.:0.0000000
## Median :0.01762 Median :0.04435 Median :0.036593 Median :0.0000000
## Mean :0.03201 Mean :0.09125 Mean :0.046553 Mean :0.0008668
## 3rd Qu.:0.04129 3rd Qu.:0.10789 3rd Qu.:0.061983 3rd Qu.:0.0000000
## Max. :0.55710 Max. :1.00000 Max. :0.731887 Max. :0.1654590
##
## lc_class210 lc_class220 lc_class230 gridcll
## Min. :0.0000 Min. :0.000000 Min. :0.00000 Min. :2.000
## 1st Qu.:0.2869 1st Qu.:0.008999 1st Qu.:0.02228 1st Qu.:2.000
## Median :0.5748 Median :0.071814 Median :0.06057 Median :2.000
## Mean :0.5346 Mean :0.140469 Mean :0.12188 Mean :2.462
## 3rd Qu.:0.7873 3rd Qu.:0.230000 3rd Qu.:0.17370 3rd Qu.:3.000
## Max. :1.0000 Max. :1.000000 Max. :0.93217 Max. :3.000
## NA's :3100
## lab
## Min. : NA
## 1st Qu.: NA
## Median : NA
## Mean :NaN
## 3rd Qu.: NA
## Max. : NA
## NA's :4660
This looks good. We will want to replace the NAs with zeros during data formatting. The only reason we have NAs is that those features weren’t present in the other data file, and because these columns are proportions of each feature, an absent feature has a proportion of zero. Replacing the NAs also means we don’t lose the other data for those sites.
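To make that reasoning concrete, here is a minimal sketch on made-up data. The two toy data frames and their values are hypothetical stand-ins, not the real HFI files. One file has a feature column the other lacks, so stacking them introduces NAs, which we then set to zero.

```r
# toy stand-ins for two HFI files; names and values are made up for illustration
hfi_a <- data.frame(site     = c("LU01_01", "LU01_02"),
                    pipeline = c(0.01, 0.02),
                    ris_road = c(0.001, 0))
hfi_b <- data.frame(site     = "LU13_01",
                    pipeline = 0.03)  # this file has no ris_road column

# add the columns hfi_b is missing as NA so the two can be stacked
hfi_b[setdiff(names(hfi_a), names(hfi_b))] <- NA_real_
toy_merged <- rbind(hfi_a, hfi_b)

# count NAs per column; only the feature missing from hfi_b has any
na_counts <- colSums(is.na(toy_merged))
na_counts[na_counts > 0]

# an absent feature covers a proportion of zero, so fill the NAs with 0
toy_fixed <- toy_merged
toy_fixed[is.na(toy_fixed)] <- 0
toy_fixed
```

On the real data, `colSums(is.na(covariates_merged))` gives the same per-column count and is a quick way to confirm that the NAs are confined to the features that were absent from one of the files.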
This section will need to be altered year-to-year to accommodate various issues that are unique to each year, but offers a good starting point.
I like to do as much of my data manipulation as I can in one dplyr pipe (i.e., one code chunk) to avoid extra coding and to avoid assigning intermediate objects to the environment that I don’t need. If this format doesn’t make sense to you, each step can be done individually if you pull the code out of the pipeline and reference the data within each function. I do write each step individually and check that it’s working correctly as I go.
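As a rough illustration of the two styles, the sketch below runs the same kind of steps on a small made-up tibble, once as a single pipe and once as separate calls with intermediate objects. The toy data and the object names (`toy`, `step1`, and so on) are hypothetical.

```r
library(dplyr)

# made-up data with the same kinds of columns as the real covariates
toy <- tibble(
  camera       = c("27", "32"),
  site         = c("LU01_01", "LU13_02"),
  array        = c("LU01", "LU13"),
  buff_dist    = c("250", "250"),
  pipeline     = c(0.01, NA),
  harvest_area = c(NA, 0.2)
)

# everything in one pipe
piped <- toy %>%
  select(!camera) %>%
  select(order(colnames(.))) %>%
  relocate(array, site, buff_dist) %>%
  replace(is.na(.), 0)

# the same steps one at a time, referencing the data inside each function
step1    <- select(toy, !camera)
step2    <- select(step1, order(colnames(step1)))
step3    <- relocate(step2, array, site, buff_dist)
stepwise <- replace(step3, is.na(step3), 0)

# both approaches should give the same result
all.equal(piped, stepwise)
```

The stepwise version leaves `step1` to `step3` in the environment, which is handy for checking each stage but is extra clutter once the code is working.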
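In the code chunk below I remove the columns we won’t use anymore (camera, gridcll, and lab), order the remaining columns alphabetically, move the columns that aren’t HFI features or landcover (array, site, and buff_dist) to the front, and replace the NAs introduced from joining the data with zeros.
Then we run summary to check that everything worked. (If you have other formatting to do, you may need to use other functions to check that everything worked.)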
covariates_fixed <- covariates_merged %>%
  # remove columns we won't use anymore
  select(!c(camera,
            gridcll,
            lab)) %>%
  # order columns alphabetically
  select(order(colnames(.))) %>%
  # we want to move the columns that aren't HFI features or landcover to the front
  relocate(.,
           c(array,
             site,
             buff_dist)) %>%
  # replace NAs introduced from joining data to zeros
  replace(is.na(.),
          0)
# check that everything looks good
summary(covariates_fixed)
## array site buff_dist airp_runway borrowpit_dry
## LU13:820 LU13_18: 20 250 : 233 Min. :0 Min. :0.0000000
## LU15:780 LU13_15: 20 500 : 233 1st Qu.:0 1st Qu.:0.0000000
## LU21:720 LU13_03: 20 750 : 233 Median :0 Median :0.0000000
## LU01:780 LU13_34: 20 1000 : 233 Mean :0 Mean :0.0009134
## LU2 :840 LU13_57: 20 1250 : 233 3rd Qu.:0 3rd Qu.:0.0003956
## LU3 :720 LU13_16: 20 1500 : 233 Max. :0 Max. :0.1038665
## (Other):4540 (Other):3262
## borrowpit_wet borrowpits camp_industrial campground
## Min. :0.000000 Min. :0.0000000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.000000
## Median :0.000000 Median :0.0000000 Median :0.0000000 Median :0.000000
## Mean :0.000642 Mean :0.0003201 Mean :0.0005956 Mean :0.000103
## 3rd Qu.:0.000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.000000
## Max. :0.271759 Max. :0.1163709 Max. :0.2450556 Max. :0.028966
##
## canal cfo clearing_unknown
## Min. :0.0000000 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.0000000 Median :0.000e+00 Median :0.0001542
## Mean :0.0000167 Mean :5.398e-07 Mean :0.0044589
## 3rd Qu.:0.0000000 3rd Qu.:0.000e+00 3rd Qu.:0.0026457
## Max. :0.0196060 Max. :1.217e-03 Max. :0.4023522
##
## clearing_wellpad_unconfirmed conventional_seismic country_residence
## Min. :0.0000000 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0.003485 1st Qu.:0.0000000
## Median :0.0000000 Median :0.006323 Median :0.0000000
## Mean :0.0003592 Mean :0.006592 Mean :0.0000608
## 3rd Qu.:0.0003713 3rd Qu.:0.009171 3rd Qu.:0.0000000
## Max. :0.0723607 Max. :0.045512 Max. :0.0171405
##
## crop cultivation_abandoned dugout
## Min. :0.000e+00 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0.000e+00 Median :0.000e+00
## Mean :1.469e-06 Mean :2.547e-05 Mean :3.480e-06
## 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :2.571e-03 Max. :3.115e-02 Max. :1.825e-03
##
## facility_other facility_unknown fruit_vegetables golfcourse
## Min. :0.0000000 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0.0000000 Median :0.0000000 Median :0 Median :0
## Mean :0.0007405 Mean :0.0001777 Mean :0 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0
## Max. :0.2009920 Max. :0.1379450 Max. :0 Max. :0
##
## greenspace grvl_sand_pit harvest_area
## Min. :0.000e+00 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.000e+00 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.000e+00 Median :0.000000 Median :0.00000
## Mean :1.466e-05 Mean :0.001888 Mean :0.04720
## 3rd Qu.:0.000e+00 3rd Qu.:0.000000 3rd Qu.:0.03969
## Max. :3.028e-03 Max. :0.557858 Max. :0.83674
##
## harvest_area_white_zone interchange_ramp lagoon landfill
## Min. :0.0000000 Min. :0 Min. :0.0000000 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0
## Median :0.0000000 Median :0 Median :0.0000000 Median :0
## Mean :0.0002387 Mean :0 Mean :0.0001343 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0
## Max. :0.0543438 Max. :0 Max. :0.0218390 Max. :0
##
## lc_class110 lc_class120 lc_class20 lc_class210
## Min. :0.000000 Min. :0.0000000 Min. :0.000000 Min. :0.0000
## 1st Qu.:0.009894 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.2869
## Median :0.036593 Median :0.0000000 Median :0.002998 Median :0.5748
## Mean :0.046553 Mean :0.0008668 Mean :0.028712 Mean :0.5346
## 3rd Qu.:0.061983 3rd Qu.:0.0000000 3rd Qu.:0.032240 3rd Qu.:0.7873
## Max. :0.731887 Max. :0.1654590 Max. :0.519648 Max. :1.0000
##
## lc_class220 lc_class230 lc_class32 lc_class33
## Min. :0.000000 Min. :0.00000 Min. :0.000e+00 Min. :0.000000
## 1st Qu.:0.008999 1st Qu.:0.02228 1st Qu.:0.000e+00 1st Qu.:0.000000
## Median :0.071814 Median :0.06057 Median :0.000e+00 Median :0.000000
## Mean :0.140469 Mean :0.12188 Mean :1.163e-05 Mean :0.003658
## 3rd Qu.:0.230000 3rd Qu.:0.17370 3rd Qu.:0.000e+00 3rd Qu.:0.000000
## Max. :1.000000 Max. :0.93217 Max. :1.176e-02 Max. :0.324028
##
## lc_class34 lc_class50 low_impact_seismic mill
## Min. :0.00000 Min. :0.00000 Min. :0.000000 Min. :0
## 1st Qu.:0.00000 1st Qu.:0.01039 1st Qu.:0.000000 1st Qu.:0
## Median :0.01762 Median :0.04435 Median :0.000000 Median :0
## Mean :0.03201 Mean :0.09125 Mean :0.005522 Mean :0
## 3rd Qu.:0.04129 3rd Qu.:0.10789 3rd Qu.:0.004557 3rd Qu.:0
## Max. :0.55710 Max. :1.00000 Max. :0.087576 Max. :0
##
## mines_oilsands mines_pitlake misc_oil_gas_facility oil_gas_plant
## Min. :0.0000000 Min. :0 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.000000
## Median :0.0000000 Median :0 Median :0.0000000 Median :0.000000
## Mean :0.0005971 Mean :0 Mean :0.0031465 Mean :0.001106
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.0006169 3rd Qu.:0.000000
## Max. :0.1223456 Max. :0 Max. :0.3449713 Max. :0.175037
##
## open_pit_mine peat pipeline recreation
## Min. :0.0000000 Min. :0 Min. :0.00000 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.00000 1st Qu.:0
## Median :0.0000000 Median :0 Median :0.01158 Median :0
## Mean :0.0005218 Mean :0 Mean :0.01810 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.02619 3rd Qu.:0
## Max. :0.0641059 Max. :0 Max. :0.28896 Max. :0
##
## reservoir residence_clearing ris_airp_runway ris_borrowpits
## Min. :0.000e+00 Min. :0.000e+00 Min. :0 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0.000e+00 Median :0 Median :0.000e+00
## Mean :8.339e-06 Mean :7.892e-06 Mean :0 Mean :1.984e-05
## 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0.000e+00
## Max. :7.894e-03 Max. :3.113e-03 Max. :0 Max. :5.063e-03
##
## ris_camp_industrial ris_clearing_unknown ris_drainage
## Min. :0 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.000e+00
## Median :0 Median :0.0000000 Median :0.000e+00
## Mean :0 Mean :0.0002653 Mean :5.813e-05
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.000e+00
## Max. :0 Max. :0.0493557 Max. :1.682e-02
##
## ris_facility_operations ris_facility_unknown ris_mines_oilsands
## Min. :0.0000000 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.0000000 Median :0.000e+00 Median :0.000e+00
## Mean :0.0002401 Mean :3.345e-08 Mean :5.357e-05
## 3rd Qu.:0.0000000 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :0.1274343 Max. :2.780e-05 Max. :5.667e-02
##
## ris_oilsands_rms ris_overburden_dump ris_plant ris_reclaim_ready
## Min. :0.0000000 Min. :0.000e+00 Min. :0 Min. :0
## 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0
## Median :0.0000000 Median :0.000e+00 Median :0 Median :0
## Mean :0.0001467 Mean :9.603e-05 Mean :0 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0
## Max. :0.0334971 Max. :2.111e-02 Max. :0 Max. :0
##
## ris_reclaimed_certified ris_reclaimed_permanent ris_reclaimed_temp
## Min. :0 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0 Median :0.0000000 Median :0.0000000
## Mean :0 Mean :0.0004046 Mean :0.0001344
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :0 Max. :0.0534939 Max. :0.0476953
##
## ris_road ris_soil_replaced ris_soil_salvaged
## Min. :0.0000000 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0.0000000 Median :0.0000000 Median :0.0000000
## Mean :0.0001202 Mean :0.0001057 Mean :0.0000938
## 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :0.0218055 Max. :0.0244751 Max. :0.0414762
##
## ris_tailing_pond ris_tank_farm ris_transmission_line ris_utilities
## Min. :0.0000000 Min. :0 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.0000000 Median :0 Median :0.000e+00 Median :0.000e+00
## Mean :0.0007656 Mean :0 Mean :6.526e-06 Mean :5.082e-06
## 3rd Qu.:0.0000000 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :0.1738171 Max. :0 Max. :2.667e-03 Max. :2.539e-03
##
## ris_waste ris_windrow rlwy_dbl_track rlwy_mlt_track
## Min. :0 Min. :0.000e+00 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.:0 1st Qu.:0
## Median :0 Median :0.000e+00 Median :0 Median :0
## Mean :0 Mean :2.231e-05 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :1.595e-02 Max. :0 Max. :0
##
## rlwy_sgl_track rlwy_spur road_gravel_1l road_gravel_2l
## Min. :0.00e+00 Min. :0 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.00e+00 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.0000000
## Median :0.00e+00 Median :0 Median :0.001385 Median :0.0000000
## Mean :2.92e-05 Mean :0 Mean :0.002913 Mean :0.0011075
## 3rd Qu.:0.00e+00 3rd Qu.:0 3rd Qu.:0.003689 3rd Qu.:0.0004745
## Max. :2.44e-02 Max. :0 Max. :0.038085 Max. :0.0438815
##
## road_paved_1l road_paved_2l road_paved_3l road_paved_4l road_paved_5l
## Min. :0 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0 Max. :0
##
## road_paved_div road_paved_undiv_1l road_paved_undiv_2l
## Min. :0.000e+00 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.000e+00 Median :0.000e+00 Median :0.0000000
## Mean :4.504e-06 Mean :7.514e-05 Mean :0.0005082
## 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :1.936e-03 Max. :2.147e-02 Max. :0.0431664
##
## road_unclassified road_unimproved road_unpaved_1l road_unpaved_2l
## Min. :0.000e+00 Min. :0.0000000 Min. :0 Min. :0
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0
## Median :0.000e+00 Median :0.0003318 Median :0 Median :0
## Mean :4.093e-06 Mean :0.0016662 Mean :0 Mean :0
## 3rd Qu.:0.000e+00 3rd Qu.:0.0018760 3rd Qu.:0 3rd Qu.:0
## Max. :8.613e-04 Max. :0.0532898 Max. :0 Max. :0
##
## road_winter rough_pasture runway rural_residence
## Min. :0 Min. :0.0000000 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0 1st Qu.:0.0000000 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0 Median :0.0000000 Median :0.000e+00 Median :0.000e+00
## Mean :0 Mean :0.0002038 Mean :3.529e-05 Mean :5.307e-05
## 3rd Qu.:0 3rd Qu.:0.0000000 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :0 Max. :0.0828324 Max. :1.525e-02 Max. :2.805e-02
##
## sump surrounding_veg tailing_pond tame_pasture
## Min. :0.000000 Min. :0.000e+00 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.000e+00 1st Qu.:0.0000000
## Median :0.000000 Median :0.000e+00 Median :0.000e+00 Median :0.0000000
## Mean :0.002142 Mean :9.553e-05 Mean :1.353e-05 Mean :0.0008195
## 3rd Qu.:0.001785 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00 3rd Qu.:0.0000000
## Max. :0.311103 Max. :8.209e-02 Max. :4.008e-03 Max. :0.1636895
##
## trail transfer_station transmission_line truck_trail
## Min. :0.0000000 Min. :0 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.0001209 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.0007039 Median :0 Median :0.000000 Median :0.000000
## Mean :0.0010490 Mean :0 Mean :0.004601 Mean :0.000609
## 3rd Qu.:0.0015517 3rd Qu.:0 3rd Qu.:0.004977 3rd Qu.:0.000398
## Max. :0.0197691 Max. :0 Max. :0.173950 Max. :0.038651
##
## urban_industrial urban_residence vegetated_edge_railways
## Min. :0.000000 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.000000 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.000000 Median :0.000e+00 Median :0.000e+00
## Mean :0.001092 Mean :4.099e-05 Mean :8.976e-05
## 3rd Qu.:0.000000 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :0.335749 Max. :1.157e-02 Max. :1.271e-01
##
## vegetated_edge_roads well_aband well_bitumen
## Min. :0.000000 Min. :0.0000000 Min. :0.000000
## 1st Qu.:0.002604 1st Qu.:0.0003367 1st Qu.:0.000000
## Median :0.006764 Median :0.0019160 Median :0.000000
## Mean :0.010682 Mean :0.0058542 Mean :0.006039
## 3rd Qu.:0.013869 3rd Qu.:0.0093228 3rd Qu.:0.005144
## Max. :0.147883 Max. :0.3045402 Max. :0.187398
##
## well_cased well_cleared_not_confirmed well_cleared_not_drilled
## Min. :0.0000000 Min. :0.0000000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.000e+00
## Median :0.0000000 Median :0.0000000 Median :0.000e+00
## Mean :0.0005716 Mean :0.0002716 Mean :2.143e-05
## 3rd Qu.:0.0001940 3rd Qu.:0.0000000 3rd Qu.:0.000e+00
## Max. :0.0685807 Max. :0.0829690 Max. :1.469e-02
##
## well_gas well_oil well_other well_unknown
## Min. :0.0000000 Min. :0 Min. :0.000000 Min. :0.000e+00
## 1st Qu.:0.0000000 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.000e+00
## Median :0.0000000 Median :0 Median :0.000000 Median :0.000e+00
## Mean :0.0003188 Mean :0 Mean :0.001548 Mean :3.274e-05
## 3rd Qu.:0.0001151 3rd Qu.:0 3rd Qu.:0.001006 3rd Qu.:0.000e+00
## Max. :0.0572117 Max. :0 Max. :0.116479 Max. :4.813e-03
##
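A summary this long is easy to misread by eye, so a small programmatic check can help. The helper below is a hypothetical sketch, not part of the original script. It assumes the three identifier columns should come first and that every other column is a proportion between 0 and 1.

```r
# hypothetical helper: checks a cleaned covariate table without having to
# read the full summary output
check_covariates <- function(df, id_cols = c("array", "site", "buff_dist")) {
  feature_cols <- setdiff(names(df), id_cols)
  # flag any feature column with values outside the 0 to 1 range
  outside <- vapply(df[feature_cols],
                    function(x) any(x < 0 | x > 1, na.rm = TRUE),
                    logical(1))
  list(
    ids_first    = identical(names(df)[seq_along(id_cols)], id_cols),
    n_missing    = sum(is.na(df)),
    out_of_range = feature_cols[outside]
  )
}

# made-up data with one deliberately impossible proportion
toy <- data.frame(
  array        = factor(c("LU01", "LU13")),
  site         = factor(c("LU01_01", "LU13_02")),
  buff_dist    = factor(c("250", "250")),
  pipeline     = c(0.01, 0),
  harvest_area = c(0, 1.2)
)

result <- check_covariates(toy)
result
```

On the real data this would be called as `check_covariates(covariates_fixed)`. We would hope to see `ids_first` as TRUE, `n_missing` as 0, and an empty `out_of_range`.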
Let’s save this merged and cleaned file in case someone wants it for their own grouping or exploration (e.g., their own version of the next steps in this script).
When naming files, make sure to follow the best data management practices for the ACME lab outlined here.
# save data in data processed folder
write_csv(covariates_fixed,
'data/processed/OSM_covariates_merged_2021_2022.csv')
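One thing to keep in mind is that a csv does not store column types, so the factor columns need to be declared again when the file is read back in. Below is a minimal sketch of a round trip using made-up data and a temporary file (`toy` and `toy_path` are hypothetical names). The column specification mirrors the one printed for the merged data above.

```r
library(readr)

# made-up data with the same identifier columns as the real file
toy <- data.frame(
  array     = factor(c("LU01", "LU13")),
  site      = factor(c("LU01_01", "LU13_02")),
  buff_dist = factor(c("250", "500")),
  pipeline  = c(0.01, 0.2)
)

# write to a temporary file so nothing in the repository is touched
toy_path <- tempfile(fileext = ".csv")
write_csv(toy, toy_path)

# read back, declaring the identifier columns as factors again
toy_back <- read_csv(toy_path,
                     col_types = cols(.default  = col_number(),
                                      array     = col_factor(),
                                      site      = col_factor(),
                                      buff_dist = col_factor()))

str(toy_back)
```

Without the `col_types` argument, buff_dist would be guessed as numeric rather than read as a factor.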
Now that we’ve merged, cleaned, and reformatted the data, we don’t need the list file or the messy merged data anymore. Let’s remove these from the environment so we don’t accidentally use them.
rm(covariates_merged,
covariates)
There are too many covariates to include in the models individually and many of them describe similar HFI features.
Now that this section is finalized, we will use the structure outlined in the covariates_table.docx which can be found in the ‘relevant_literature’ folder of this repository for formatting the covariates for this and future related analyses.
The covariates_table and the README file in this repository include descriptions of each feature from the ABMI human footprints wall to wall data download website for Year 2021. These descriptions can also be found in the relevant_literature folder of this repository (HFI_2021_v1_0_Metadata_Final.pdf).
As we prepare to lump the covariates together, we may need to reference the column names. Let’s print them now so we have them fresh in the console.
names(covariates_fixed)
## [1] "array" "site"
## [3] "buff_dist" "airp_runway"
## [5] "borrowpit_dry" "borrowpit_wet"
## [7] "borrowpits" "camp_industrial"
## [9] "campground" "canal"
## [11] "cfo" "clearing_unknown"
## [13] "clearing_wellpad_unconfirmed" "conventional_seismic"
## [15] "country_residence" "crop"
## [17] "cultivation_abandoned" "dugout"
## [19] "facility_other" "facility_unknown"
## [21] "fruit_vegetables" "golfcourse"
## [23] "greenspace" "grvl_sand_pit"
## [25] "harvest_area" "harvest_area_white_zone"
## [27] "interchange_ramp" "lagoon"
## [29] "landfill" "lc_class110"
## [31] "lc_class120" "lc_class20"
## [33] "lc_class210" "lc_class220"
## [35] "lc_class230" "lc_class32"
## [37] "lc_class33" "lc_class34"
## [39] "lc_class50" "low_impact_seismic"
## [41] "mill" "mines_oilsands"
## [43] "mines_pitlake" "misc_oil_gas_facility"
## [45] "oil_gas_plant" "open_pit_mine"
## [47] "peat" "pipeline"
## [49] "recreation" "reservoir"
## [51] "residence_clearing" "ris_airp_runway"
## [53] "ris_borrowpits" "ris_camp_industrial"
## [55] "ris_clearing_unknown" "ris_drainage"
## [57] "ris_facility_operations" "ris_facility_unknown"
## [59] "ris_mines_oilsands" "ris_oilsands_rms"
## [61] "ris_overburden_dump" "ris_plant"
## [63] "ris_reclaim_ready" "ris_reclaimed_certified"
## [65] "ris_reclaimed_permanent" "ris_reclaimed_temp"
## [67] "ris_road" "ris_soil_replaced"
## [69] "ris_soil_salvaged" "ris_tailing_pond"
## [71] "ris_tank_farm" "ris_transmission_line"
## [73] "ris_utilities" "ris_waste"
## [75] "ris_windrow" "rlwy_dbl_track"
## [77] "rlwy_mlt_track" "rlwy_sgl_track"
## [79] "rlwy_spur" "road_gravel_1l"
## [81] "road_gravel_2l" "road_paved_1l"
## [83] "road_paved_2l" "road_paved_3l"
## [85] "road_paved_4l" "road_paved_5l"
## [87] "road_paved_div" "road_paved_undiv_1l"
## [89] "road_paved_undiv_2l" "road_unclassified"
## [91] "road_unimproved" "road_unpaved_1l"
## [93] "road_unpaved_2l" "road_winter"
## [95] "rough_pasture" "runway"
## [97] "rural_residence" "sump"
## [99] "surrounding_veg" "tailing_pond"
## [101] "tame_pasture" "trail"
## [103] "transfer_station" "transmission_line"
## [105] "truck_trail" "urban_industrial"
## [107] "urban_residence" "vegetated_edge_railways"
## [109] "vegetated_edge_roads" "well_aband"
## [111] "well_bitumen" "well_cased"
## [113] "well_cleared_not_confirmed" "well_cleared_not_drilled"
## [115] "well_gas" "well_oil"
## [117] "well_other" "well_unknown"
Now we will use the mutate() function with some
tidyverse trickery (i.e., nesting across() and
contains() in rowSums()) to sum across each
observation (row) by searching for various character strings. If there
isn’t a common character string for the variables we want to sum,
then we provide each one individually. We can also combine these methods
(e.g., with ‘facilities’ [see code]).
covariates_grouped <- covariates_fixed %>%
# rename 'vegetated_edge_roads' so that we can use 'road' as a keyword to group roads without including this feature
rename('vegetated_edge_rds' = vegetated_edge_roads) %>%
# within the mutate function create new column names for the grouped variables
mutate(
# borrowpits
borrowpits = rowSums(across(contains('borrowpit'))) + # here we use rowSums() with across() and contains() to sum across each row the values of all columns that contain the keyword. Be careful that the keyword doesn't also match variables you don't want to include!
dugout +
lagoon +
sump,
# clearings
clearings = rowSums(across(contains('clearing'))) +
runway,
# cultivations
cultivation = crop +
cultivation_abandoned +
fruit_vegetables +
rough_pasture +
tame_pasture,
# harvest areas
harvest = rowSums(across(contains('harvest'))),
# industrial facilities
facilities = rowSums(across(contains('facility'))) +
rowSums(across(contains('plant'))) +
camp_industrial +
mill +
ris_camp_industrial +
ris_tank_farm +
ris_utilities +
urban_industrial,
# mine areas
mines = rowSums(across(contains('mine'))) +
rowSums(across(contains('tailing'))) +
grvl_sand_pit +
peat +
ris_drainage +
ris_oilsands_rms +
ris_overburden_dump +
ris_reclaim_ready +
ris_soil_salvaged +
ris_waste,
# railways
railways = rowSums(across(contains('rlwy'))),
# reclaimed areas
reclaimed = rowSums(across(contains('reclaimed'))) +
ris_soil_replaced +
ris_windrow,
# recreation areas
recreation = campground +
golfcourse +
greenspace +
recreation,
# residential areas (can't use residence as keyword because 'residence_clearing' is in clearing unless we rearrange groupings or rename that one)
residential = country_residence +
rural_residence +
urban_residence,
# roads (we renamed 'vegetated_edge_roads' above to 'vegetated_edge_rds' so we can use roads as keyword here which saves a bunch of coding as there are many many road variables)
roads = rowSums(across(contains('road'))) +
interchange_ramp +
airp_runway +
ris_airp_runway +
transfer_station,
# seismic lines
seismic_lines = conventional_seismic,
# 3D seismic lines (put the 3D at the end of the name though to make R happy)
seismic_lines_3D = low_impact_seismic,
# transmission lines
transmission_lines = rowSums(across(contains('transmission'))),
# trails
trails = rowSums(across(contains('trail'))),
# vegetated edges
veg_edges = rowSums(across(contains('vegetated'))) +
surrounding_veg,
# man-made water features
water = canal +
reservoir,
# well sites (note: contains('well') also matches 'clearing_wellpad_unconfirmed', which is already summed into clearings above; need to check)
wells = rowSums(across(contains('well'))),
# remove columns that were used to create new columns to tidy the data frame
.keep = 'unused') %>%
# reorder alphabetically except array, site and buff_dist
select(order(colnames(.))) %>%
# we want to move the columns that aren't HFI features or landcover to the front
relocate(.,
c(array,
site,
buff_dist)) %>%
# reorder variables so the veg data is after all the HFI data
relocate(starts_with('lc_class'),
.after = wells)
# see what's left
names(covariates_grouped)
## [1] "array" "site" "buff_dist"
## [4] "borrowpits" "cfo" "clearings"
## [7] "cultivation" "facilities" "harvest"
## [10] "landfill" "mines" "pipeline"
## [13] "railways" "reclaimed" "recreation"
## [16] "residential" "roads" "seismic_lines"
## [19] "seismic_lines_3D" "trails" "transmission_lines"
## [22] "veg_edges" "water" "wells"
## [25] "lc_class110" "lc_class120" "lc_class20"
## [28] "lc_class210" "lc_class220" "lc_class230"
## [31] "lc_class32" "lc_class33" "lc_class34"
## [34] "lc_class50"
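To make the keyword matching concrete, here is a minimal sketch on a toy tibble. The column names and values are made up for illustration and are not taken from the OSM data. It shows that contains('well') also picks up a column such as clearing_wellpad, and one possible way to exclude it by negating a second keyword.

```r
library(dplyr)

# toy data; column names and values are hypothetical
toy <- tibble::tibble(
  site             = c('A', 'B'),
  well_gas         = c(0.10, 0.00),
  well_oil         = c(0.20, 0.30),
  clearing_wellpad = c(0.05, 0.00),
  trail            = c(0.01, 0.02)
)

# check which columns a keyword matches before summing
matched <- toy %>%
  select(contains('well')) %>%
  names()

# sum every column that contains 'well' (this includes clearing_wellpad)
toy_grouped <- toy %>%
  mutate(wells = rowSums(across(contains('well'))),
         .keep = 'unused')

# sum only the 'well' columns that do not also contain 'clearing'
toy_wells_only <- toy %>%
  mutate(wells = rowSums(across(contains('well') & !contains('clearing'))),
         .keep = 'unused')
```

Printing matched before building a group is a cheap way to catch an unintended match. Whether clearing_wellpad_unconfirmed belongs in clearings, wells, or both is a decision for the analyst, so the negation is shown only as an option.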
# check the structure of new data
str(covariates_grouped)
## tibble [4,660 × 34] (S3: tbl_df/tbl/data.frame)
## $ array : Factor w/ 6 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ site : Factor w/ 233 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ buff_dist : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ borrowpits : num [1:4660] 0 0 0 0 0 ...
## $ cfo : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ clearings : num [1:4660] 0.0923 0.0697 0 0 0 ...
## $ cultivation : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ facilities : num [1:4660] 0.291 0 0 0 0 ...
## $ harvest : num [1:4660] 0 0 0.687 0.337 0 ...
## $ landfill : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ mines : num [1:4660] 0 0.0873 0 0 0 ...
## $ pipeline : num [1:4660] 0 0.068 0 0 0.0301 ...
## $ railways : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ reclaimed : num [1:4660] 0 0.0477 0 0 0 ...
## $ recreation : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ residential : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ roads : num [1:4660] 0 0.0174 0 0 0 ...
## $ seismic_lines : num [1:4660] 0 0.03277 0 0.00889 0.01144 ...
## $ seismic_lines_3D : num [1:4660] 0 0 0 0 0.0523 ...
## $ trails : num [1:4660] 0.00588 0.0028 0 0.01591 0 ...
## $ transmission_lines: num [1:4660] 0.0642 0 0 0 0.091 ...
## $ veg_edges : num [1:4660] 0 0.0858 0 0 0 ...
## $ water : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ wells : num [1:4660] 0 0 0 0 0.0322 ...
## $ lc_class110 : num [1:4660] 0.193 0.348 0 0 0.178 ...
## $ lc_class120 : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_class20 : num [1:4660] 0.0361 0 0 0 0 ...
## $ lc_class210 : num [1:4660] 0.456 0.358 0.186 1 0.822 ...
## $ lc_class220 : num [1:4660] 0 0 0 0 0 ...
## $ lc_class230 : num [1:4660] 0 0.101 0.255 0 0 ...
## $ lc_class32 : num [1:4660] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_class33 : num [1:4660] 0 0.101 0 0 0 ...
## $ lc_class34 : num [1:4660] 0 0.0916 0 0 0 ...
## $ lc_class50 : num [1:4660] 0.316 0 0.559 0 0 ...
# check summary of new data
summary(covariates_grouped)
## array site buff_dist borrowpits
## LU13:820 LU13_18: 20 250 : 233 Min. :0.000000
## LU15:780 LU13_15: 20 500 : 233 1st Qu.:0.000000
## LU21:720 LU13_03: 20 750 : 233 Median :0.001334
## LU01:780 LU13_34: 20 1000 : 233 Mean :0.004175
## LU2 :840 LU13_57: 20 1250 : 233 3rd Qu.:0.004419
## LU3 :720 LU13_16: 20 1500 : 233 Max. :0.311103
## (Other):4540 (Other):3262
## cfo clearings cultivation facilities
## Min. :0.000e+00 Min. :0.0000000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.000e+00 Median :0.0004464 Median :0.00000 Median :0.000000
## Mean :5.398e-07 Mean :0.0051266 Mean :0.00105 Mean :0.007104
## 3rd Qu.:0.000e+00 3rd Qu.:0.0036890 3rd Qu.:0.00000 3rd Qu.:0.003121
## Max. :1.217e-03 Max. :0.4023522 Max. :0.18015 Max. :0.466010
##
## harvest landfill mines pipeline
## Min. :0.00000 Min. :0 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.00000 Median :0 Median :0.000000 Median :0.01158
## Mean :0.04744 Mean :0 Mean :0.004234 Mean :0.01810
## 3rd Qu.:0.04085 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.:0.02619
## Max. :0.83674 Max. :0 Max. :0.557858 Max. :0.28896
##
## railways reclaimed recreation residential
## Min. :0.00e+00 Min. :0.000000 Min. :0.0000000 Min. :0.0000000
## 1st Qu.:0.00e+00 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.0000000
## Median :0.00e+00 Median :0.000000 Median :0.0000000 Median :0.0000000
## Mean :2.92e-05 Mean :0.000667 Mean :0.0001176 Mean :0.0001549
## 3rd Qu.:0.00e+00 3rd Qu.:0.000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000
## Max. :2.44e-02 Max. :0.078325 Max. :0.0289661 Max. :0.0280500
##
## roads seismic_lines seismic_lines_3D trails
## Min. :0.000000 Min. :0.000000 Min. :0.000000 Min. :0.0000000
## 1st Qu.:0.002264 1st Qu.:0.003485 1st Qu.:0.000000 1st Qu.:0.0002191
## Median :0.004540 Median :0.006323 Median :0.000000 Median :0.0011395
## Mean :0.006399 Mean :0.006592 Mean :0.005522 Mean :0.0016580
## 3rd Qu.:0.008663 3rd Qu.:0.009171 3rd Qu.:0.004557 3rd Qu.:0.0022439
## Max. :0.071812 Max. :0.045512 Max. :0.087576 Max. :0.0386512
##
## transmission_lines veg_edges water wells
## Min. :0.000000 Min. :0.000000 Min. :0.000e+00 Min. :0.0000000
## 1st Qu.:0.000000 1st Qu.:0.002612 1st Qu.:0.000e+00 1st Qu.:0.0007196
## Median :0.000000 Median :0.006847 Median :0.000e+00 Median :0.0061983
## Mean :0.004607 Mean :0.010867 Mean :2.504e-05 Mean :0.0150163
## 3rd Qu.:0.004977 3rd Qu.:0.014110 3rd Qu.:0.000e+00 3rd Qu.:0.0177990
## Max. :0.173950 Max. :0.156249 Max. :1.961e-02 Max. :0.3045402
##
## lc_class110 lc_class120 lc_class20 lc_class210
## Min. :0.000000 Min. :0.0000000 Min. :0.000000 Min. :0.0000
## 1st Qu.:0.009894 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.2869
## Median :0.036593 Median :0.0000000 Median :0.002998 Median :0.5748
## Mean :0.046553 Mean :0.0008668 Mean :0.028712 Mean :0.5346
## 3rd Qu.:0.061983 3rd Qu.:0.0000000 3rd Qu.:0.032240 3rd Qu.:0.7873
## Max. :0.731887 Max. :0.1654590 Max. :0.519648 Max. :1.0000
##
## lc_class220 lc_class230 lc_class32 lc_class33
## Min. :0.000000 Min. :0.00000 Min. :0.000e+00 Min. :0.000000
## 1st Qu.:0.008999 1st Qu.:0.02228 1st Qu.:0.000e+00 1st Qu.:0.000000
## Median :0.071814 Median :0.06057 Median :0.000e+00 Median :0.000000
## Mean :0.140469 Mean :0.12188 Mean :1.163e-05 Mean :0.003658
## 3rd Qu.:0.230000 3rd Qu.:0.17370 3rd Qu.:0.000e+00 3rd Qu.:0.000000
## Max. :1.000000 Max. :0.93217 Max. :1.176e-02 Max. :0.324028
##
## lc_class34 lc_class50
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.01039
## Median :0.01762 Median :0.04435
## Mean :0.03201 Mean :0.09125
## 3rd Qu.:0.04129 3rd Qu.:0.10789
## Max. :0.55710 Max. :1.00000
##
# there are some NAs in the data, which will cause problems with modeling/visualization; drop those rows for now, we will explore these sites specifically after the report
covariates_grouped <- covariates_grouped %>%
# remove rows with NAs
na.omit()
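Before dropping rows, it can help to see where the NAs are. The sketch below uses a small made-up tibble (not the OSM data) to count NAs per column and to list the rows that na.omit() would remove.

```r
library(dplyr)

# toy data; values are hypothetical
toy <- tibble::tibble(
  site   = c('A', 'B', 'C'),
  roads  = c(0.1, NA, 0.3),
  trails = c(0.0, 0.2, NA)
)

# number of NAs in each column
na_per_column <- colSums(is.na(toy))

# rows with at least one NA, i.e. the rows na.omit() would drop
rows_with_na <- toy %>%
  filter(if_any(everything(), is.na))

# complete cases only
toy_complete <- na.omit(toy)
```

Saving something like rows_with_na before the na.omit() step would keep a record of which sites to revisit later.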
Let’s look at the histograms again and see if we need to remove any features or feature groups without enough data.
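# use for loop to plot histograms for all covariates
# (columns 1-3 are array, site, and buff_dist, so the covariates start at column 4)
for (col in 4:ncol(covariates_grouped)) {
  # [[col]] extracts the column as a numeric vector, which is what base hist() expects
  hist(covariates_grouped[[col]],
       main = names(covariates_grouped)[col],
       xlab = names(covariates_grouped)[col])
}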
> IMO we don’t have enough variation in the data to use some of the features/feature groups; these are the columns dropped with select() in the code below.
Also, there’s not a lot of data for several features that are similar and of interest to OSM. In the past they’ve been grouped together, and we will do the same here (see osm_industrial in the code below).
For this analysis we will also add facilities and mines to that group.
So let’s modify the data and remove those features for now. This step will likely need to be changed each year.
Let’s also rename the landcover classes so they make more sense without having to look them up by number (maybe this should be added earlier in the script for next year).
covariates_grouped <- covariates_grouped %>%
# create column osm_industrial
mutate(
osm_industrial = borrowpits +
clearings +
facilities +
mines,
# remove columns we used to make this variable
.keep = 'unused') %>%
# remove other features we don't need
select(!c(cfo,
cultivation,
reclaimed,
recreation,
residential,
water,
lc_class20,
lc_class120,
lc_class32,
lc_class33,
landfill,
railways)) %>%
# rename landcover classes
rename(
grassland = lc_class110,
coniferous = lc_class210,
broadleaf = lc_class220,
mixed = lc_class230,
developed = lc_class34,
shrub = lc_class50)
# check that it worked
names(covariates_grouped)
## [1] "array" "site" "buff_dist"
## [4] "harvest" "pipeline" "roads"
## [7] "seismic_lines" "seismic_lines_3D" "trails"
## [10] "transmission_lines" "veg_edges" "wells"
## [13] "grassland" "coniferous" "broadleaf"
## [16] "mixed" "developed" "shrub"
## [19] "osm_industrial"
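If the landcover renaming is moved earlier in the script next year, one option is to keep the mapping in a single named vector and apply it with rename() and any_of(). This is only a sketch: landcover_lookup is a hypothetical name, the toy data is made up, and the class-to-name mapping is copied from the rename() call above.

```r
library(dplyr)

# new_name = 'old_name'; mapping copied from the rename() call in this script
landcover_lookup <- c(
  grassland  = 'lc_class110',
  coniferous = 'lc_class210',
  broadleaf  = 'lc_class220',
  mixed      = 'lc_class230',
  developed  = 'lc_class34',
  shrub      = 'lc_class50'
)

# toy data with only some of the landcover columns present
toy <- tibble::tibble(
  roads       = c(0.01, 0.02),
  lc_class110 = c(0.20, 0.10),
  lc_class50  = c(0.30, 0.40)
)

# any_of() renames the columns that exist and silently skips the rest
toy_renamed <- toy %>%
  rename(any_of(landcover_lookup))
```

Because any_of() ignores missing columns, the same lookup could be reused in a script where some classes have already been dropped.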
Let’s save this data now that it’s all formatted and grouped.
write_csv(covariates_grouped,
'data/processed/OSM_covariates_grouped_2021_2022.csv')
Let’s remove the data frames we no longer need.
rm(covariates_fixed)
We need to subset the data so we have separate data frames for each buffer width to work with in the analysis AND to explore correlation between variables at each buffer width, as these may vary with spatial scale.
Let’s use a for loop to subset the data.
buffer_frames <- list()
for (i in unique(covariates_grouped$buff_dist)){
print(i)
# Subset data based on radius
df <- covariates_grouped %>%
filter(buff_dist == i)
# list of dataframes
buffer_frames <- c(buffer_frames, list(df))
}
## [1] "250"
## [1] "500"
## [1] "750"
## [1] "1000"
## [1] "1250"
## [1] "1500"
## [1] "1750"
## [1] "2000"
## [1] "2250"
## [1] "2500"
## [1] "2750"
## [1] "3000"
## [1] "3250"
## [1] "3500"
## [1] "3750"
## [1] "4000"
## [1] "4250"
## [1] "4500"
## [1] "4750"
## [1] "5000"
# name list objects so we can extract names for plotting
buffer_frames <- buffer_frames %>%
# typing out every name is a long way to do this, but it works and is kept for the sake of time
purrr::set_names('250 meter buffer',
'500 meter buffer',
'750 meter buffer',
'1000 meter buffer',
'1250 meter buffer',
'1500 meter buffer',
'1750 meter buffer',
'2000 meter buffer',
'2250 meter buffer',
'2500 meter buffer',
'2750 meter buffer',
'3000 meter buffer',
'3250 meter buffer',
'3500 meter buffer',
'3750 meter buffer',
'4000 meter buffer',
'4250 meter buffer',
'4500 meter buffer',
'4750 meter buffer',
'5000 meter buffer')
Now we have a list with data frames for each buffer width which we can work with later.
We will need to repeat this step in the analysis script.
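Since this step has to be repeated, a more compact version may be worth considering. The sketch below is an alternative to the for loop and the typed-out names, not the code used in this script. It uses base split() to make one data frame per buffer distance and purrr::set_names() with a formula to build the names. The toy data is made up.

```r
library(dplyr)
library(purrr)

# toy data; values are hypothetical
toy <- tibble::tibble(
  site      = rep(c('A', 'B'), times = 3),
  buff_dist = factor(rep(c('250', '500', '750'), each = 2),
                     levels = c('250', '500', '750')),
  roads     = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06)
)

# split() returns a list with one element per factor level, named by level;
# set_names() then appends ' meter buffer' to each existing name
toy_frames <- split(toy, toy$buff_dist) %>%
  set_names(~ paste(.x, 'meter buffer'))

names(toy_frames)
```

Note that split() orders the list by factor level rather than by order of appearance, and it creates an empty data frame for any level with no rows, so it is worth checking the result against the for loop version before swapping it in.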
Now we need to make correlation plots for each buffer width to see
which variables are correlated at a given spatial scale. We can use
purrr::map() with the chart.Correlation()
function from the PerformanceAnalytics package to make
correlation plots with a specified method (e.g., pearson, spearman,
etc.) that also show histograms and scatterplots of each variable.
correlation_plots <- buffer_frames %>%
purrr::map(
~.x %>%
# select numeric variables only since we can't compute a correlation for non-numeric variables
select_if(is.numeric) %>%
# use chart.correlation
chart.Correlation(.,
histogram = TRUE,
method = "pearson")
)
There is a section for each buffer width outlining the variables that are highly correlated and thus should not be included in the same model; each section includes the r2 as well.
buffer_frames$`250 meter buffer` %>%
select_if(is.numeric) %>%
# use chart.correlation
chart.Correlation(.,
histogram = TRUE,
method = "pearson")
mtext('250 meter buffer', side = 3, line = 3)
buffer_frames$`500 meter buffer` %>%
select_if(is.numeric) %>%
# use chart.correlation
chart.Correlation(.,
histogram = TRUE,
method = "pearson")
mtext('500 meter buffer', side = 3, line = 3)
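The plots are dense, so it may help to also list the correlated pairs as a table. The sketch below uses base cor() on made-up data and keeps pairs whose absolute Pearson correlation exceeds a cutoff. The 0.7 cutoff is an assumption for illustration, not a threshold taken from this analysis.

```r
# toy data; values are hypothetical
toy <- data.frame(a = 1:10)
toy$b <- toy$a * 2 + rep(c(0.1, -0.1), 5)  # almost a linear function of a
toy$c <- rep(c(1, -1), 5)                  # only weakly related to a and b

# pairwise Pearson correlations
cor_mat <- cor(toy, method = 'pearson')

# blank the diagonal and lower triangle so each pair appears once
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA

# reshape to one row per pair and drop the blanked entries
cor_pairs <- as.data.frame(as.table(cor_mat))
names(cor_pairs) <- c('var1', 'var2', 'r')
cor_pairs <- cor_pairs[!is.na(cor_pairs$r), ]

# keep pairs above the (assumed) cutoff
high_pairs <- cor_pairs[abs(cor_pairs$r) > 0.7, ]
high_pairs
```

To apply this to the real data, the same steps could be wrapped in a function and passed to purrr::map() over buffer_frames after selecting the numeric columns.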
Add more to this section later, when we have more time to explore the covariates and choose which should be included, etc.
# use this code to change the figure margins; otherwise the plots will not draw because the figure margins are too large
par(mar=c(1,1,1,1))
# now use purrr to plot histograms for all remaining HFI variables for each buffer
hfi_histograms <- buffer_frames %>%
  purrr::imap(
    ~.x %>%
      # filter to just the HFI variables; the landcover columns were renamed above,
      # so they no longer start with 'lc_class' and are excluded by their new names
      select(where(is.numeric) &
               !any_of(c('grassland', 'coniferous', 'broadleaf',
                         'mixed', 'developed', 'shrub'))) %>%
      # pipe into hist.data.frame function to make histograms for each variable
      hist.data.frame(mtitl = paste0('Histograms of HFI variables at ', .y)))
Now let’s do the same thing with the landcover variables.
lc_histograms <- buffer_frames %>%
  purrr::imap(
    ~.x %>%
      # filter to just the landcover variables; they were renamed above,
      # so select them by their new names rather than the 'lc_class' prefix
      select(any_of(c('grassland', 'coniferous', 'broadleaf',
                      'mixed', 'developed', 'shrub'))) %>%
      # pipe into hist.data.frame function to make histograms for each variable
      hist.data.frame(mtitl = paste0('Histograms of landcover variables at ', .y)))